Feature Subset Selection Algorithm for High Dimensional Data using Fast Clustering Method
Abstract
Feature selection identifies the most useful features: a subset that produces results compatible with those of the entire set of features. A feature selection algorithm can be evaluated from both the efficiency and the effectiveness points of view. Efficiency concerns the time required to find a subset of features, while effectiveness concerns the quality of that subset. Based on these criteria, we propose a fast clustering-based feature selection algorithm (FAST). FAST works in two steps. First, the features are divided into clusters. Then the most useful feature is selected from each cluster. We adopt the minimum spanning tree (MST) to increase the efficiency of FAST. FAST is compared with several well-known feature selection algorithms: FCBF, Relief, CFS, Consist, and FOCUS-SF.
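The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how feature similarity or usefulness is measured, so the sketch assumes absolute Pearson correlation for both, and it assumes that clusters are obtained by cutting MST edges whose distance exceeds a threshold. The names `pearson`, `minimum_spanning_tree`, `select_features`, and the `cut` parameter are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def minimum_spanning_tree(n, weight):
    """Prim's algorithm on a complete graph with n nodes; returns (i, j, w) edges."""
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Cheapest edge leaving the current tree.
        i, j = min(
            ((a, b) for a in in_tree for b in range(n) if b not in in_tree),
            key=lambda e: weight[e[0]][e[1]],
        )
        edges.append((i, j, weight[i][j]))
        in_tree.add(j)
    return edges


def select_features(columns, target, cut=0.5):
    """Return the sorted indices of one representative feature per cluster.

    Assumptions (not from the abstract): distance = 1 - |Pearson correlation|,
    MST edges with distance > cut are removed, and the representative is the
    feature most correlated with the target.
    """
    n = len(columns)
    dist = [[1.0 - abs(pearson(columns[i], columns[j])) for j in range(n)]
            for i in range(n)]
    mst = minimum_spanning_tree(n, dist)

    # Step 1: the MST edges that survive the cut form a forest; each tree is a cluster.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j, w in mst:
        if w <= cut:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)

    # Step 2: keep the feature most relevant to the target from each cluster.
    relevance = [abs(pearson(c, target)) for c in columns]
    return sorted(max(members, key=lambda i: relevance[i])
                  for members in clusters.values())


# Toy data: features 0 and 1 are near-duplicates, as are features 2 and 3.
features = [
    [1, 2, 3, 4, 5, 6],
    [2, 4, 6, 8, 10, 12.5],
    [1, -1, 1, -1, 1, -1],
    [1.1, -0.9, 1, -1, 1.2, -1],
]
labels = [4, -1, 6, 1, 8, 3]
print(select_features(features, labels))
```

On the toy data the sketch keeps two features, one from each pair of near-duplicates. The MST is built over a complete graph, so only n - 1 edges need to be examined when forming clusters, rather than every pair of features.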
Related articles
Online Streaming Feature Selection Using Geometric Series of the Adjacency Matrix of Features
Feature Selection (FS) is an important pre-processing step in machine learning and data mining. All the traditional feature selection methods assume that the entire feature space is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows with t...
Feature Selection for Small Sample Sets with High Dimensional Data Using Heuristic Hybrid Approach
Feature selection can significantly be decisive when analyzing high dimensional data, especially with a small number of samples. Feature extraction methods do not have decent performance in these conditions. With small sample sets and high dimensional data, exploring a large search space and learning from insufficient samples becomes extremely hard. As a result, neural networks and clustering a...
A New Hybrid Feature Subset Selection Algorithm for the Analysis of Ovarian Cancer Data Using Laser Mass Spectrum
Introduction: A major problem in the treatment of cancer is the lack of an appropriate method for the early diagnosis of the disease. The chemical reaction within an organ may be reflected in the form of proteomic patterns in the serum, sputum, or urine. Laser mass spectrometry is a valuable tool for extracting the proteomic patterns from biological samples. A major challenge in extracting such ...
Algorithm For Identifying Relevant Features Using Fast Clustering
In high dimensional data sets, feature selection involves identifying a subset of the most useful features that produces results compatible with those of the original entire set of features. A feature selection algorithm may be evaluated in terms of both efficiency, which concerns the time required to find a subset of features, and effectiveness, which relates to the quality of the subset of features. Fast clustering based featur...
Fast Feature subset selection algorithm based on clustering for high dimensional data
A feature selection algorithm is employed to remove irrelevant and redundant information from the data. Among feature subset selection algorithms, filter methods are used because of their generality and are usually a good choice when the number of features is large. In cluster analysis, graph-theoretic clustering methods are applied to features. In particular, the minimum spanning tree (MST) based clustering ...
High Dimensional Data Clustering Using Fast Cluster Based Feature Selection
Feature selection involves identifying a subset of the most useful features that produces compatible results as the original entire set of features. A feature selection algorithm may be evaluated from both the efficiency and effectiveness points of view. While the efficiency concerns the time required to find a subset of features, the effectiveness is related to the quality of the subset of fea...
Publication date: 2014